ABSTRACT

Programs for calculating the reliability of
fault-tolerant systems do not explicitly take into
account the effect of failures in the hardware
recovery mechanism. This paper shows how to
incorporate the failures in the recovery mechanism
of a simple redundant system in the fault-handling
(coverage) model of CARE III and how to calculate
the required coverage parameters, specifically the
probability that a failure is not lethal. It is
also shown that CARE III gives a conservative
estimate of the reliability of the redundant
system.

1  INTRODUCTION 

Analytical models have been developed to
estimate the reliability of computer systems.
These models can be applied to a large class of
fault-tolerant systems [Bridgman 84] [Ng 80]. The
user must calculate the required parameters for
these models. One of these parameters is the
coverage, the conditional probability of
successful error recovery given that an error has
occurred. Error recovery consists of error
detection, isolation and system reconfiguration.
The sensitivity of the reliability to a small error
in the coverage estimation is well known [Arnold
72]. The reliability of the hardware responsible
for the error recovery has to be taken into account
[Losq 76] [Ogus 74].

CARE III is a well-known analytical model for
reliability calculation. It has a separate model
for the coverage where it is assumed that the
isolation of a detected error and the recovery from
it will always be successful [Bavuso 84] [Geist 83]
[Trivedi 81] [Trivedi 83]. Thus, coverage in CARE
III consists only of error detection. Furthermore,
CARE III cannot model latent faults in stand-by
spare modules or recovery mechanisms. Latent
faults are faults that will not generate errors
until a fault occurs in the active module.

In this paper, a hypothetical fault-tolerant
system is designed to point out the difficulties
encountered in the calculation of the coverage
parameter(s). The failures in the hardware
recovery mechanism are included in the coverage
model of CARE III.

In Sec. 2, the system is described. The
results of logic simulation on a Daisy Megalogician
(CAD system) are reported. The faults are divided
into different classes and some examples are given
to show the effect of the faults in the recovery
mechanism on the reliability of the system. The
fault classes resemble those described in [Losq
76]. Only permanent faults are modeled.

In Sec. 3, the coverage parameters and the
reliability of the system are calculated using the
models in CARE III. Special attention is given to
the calculation of c, the probability of a
failure not being lethal to the system [Bavuso 84].
The reliability calculated using CARE III is then
compared to a reliability prediction obtained from
a Markov model specific to the system under study.
It is shown that the CARE III models produce a
conservative estimate of the reliability.

Throughout this paper, a failure will refer to
a physical defect in a component while a fault will
be the model describing that failure. On the other
hand, an error occurs when a component performs its
function incorrectly.

2  THE  FAULT-TOLERANT  SYSTEM 

The fault-tolerant system consists of two
identical modules (X and Y). Module X is active
(i.e. connected to the bus) in the error-free
condition. A detector controls a switch that
connects module Y to the bus in case of an error in
X. So, Y is a powered back-up spare in the system.

Figure 1 shows the system. It consists of an
OR gate whose function is replicated by a NAND gate
and two inverters. An EXCLUSIVE-OR gate compares
the outputs of the OR and NAND gates to detect any
error. These two outputs are connected to the bus
through two buffers with 3-state outputs. The
switch is implemented by a D latch that is
initially set so that module X is connected to the
bus. If the EXCLUSIVE-OR gate detects a
discrepancy between the outputs of the OR and NAND
gates, the switch disconnects module X from the bus
and connects module Y.
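
The switching behavior described above can be illustrated with a
small behavioral sketch. It is not the netlist of Fig. 1: the
function names, the single injected fault site (the output of
module X) and the assumption that the switch takes effect within
the same cycle are all simplifications made for illustration.

```python
from itertools import product

def module_x(a, b, stuck=None):
    """Module X: a single OR gate. `stuck` (0 or 1) forces its output."""
    out = a | b
    return out if stuck is None else stuck

def module_y(a, b):
    """Module Y: NAND gate fed by two inverters, logically equal to OR."""
    return 1 - ((1 - a) & (1 - b))

def run(inputs, x_stuck=None):
    """Apply input pairs in sequence and return the values seen on the bus."""
    select_x = True  # D latch initially set: module X drives the bus
    bus = []
    for a, b in inputs:
        x, y = module_x(a, b, x_stuck), module_y(a, b)
        if x ^ y:             # EXCLUSIVE-OR detector sees a discrepancy
            select_x = False  # latch flips and keeps module Y selected
        bus.append(x if select_x else y)
    return bus

patterns = list(product((0, 1), repeat=2))
expected = [a | b for a, b in patterns]
fault_free_bus = run(patterns)
faulty_bus = run(patterns, x_stuck=0)  # detectable fault in module X
```

In this idealized model a stuck-at fault on the output of module X
is masked, because module Y takes over as soon as the two outputs
disagree. Faults that affect both detector inputs equally, or faults
in the switch itself, are not captured by this sketch.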

This system could be made much more reliable by
having two redundant switches as in the Bus
Guardians of the FTMP [Hopkins 78]. Also, both
modules could be systematically exercised or
"flexed" to detect latent faults. The emphasis in
this paper is on the calculation of the parameters
necessary for the reliability models, not on the
design of reliable systems.

The fault model used in this analysis will be a
permanent single stuck-at fault model. The faults
are divided into five classes in order to calculate
the coverage parameters. A Daisy Megalogician was
used to perform the logic simulation of the system
and to determine the classification of the faults.
Three of these classes correspond to faults in
modules X and Y:

1) Undetectable faults: Since a fault in lead A,
for example, will have the same effect on both
inputs of the EXCLUSIVE-OR gate, it is undetectable
and the system fails (incorrect data on the output
bus).

2) Detectable faults: A fault in lead E, for
example, will only affect one of the inputs of the
EXCLUSIVE-OR gate. It will be detected and the
switch will connect module Y to the bus.

3) Fatal faults: C s-a-0, for example, will force
incorrect data on the output bus and the system
will immediately fail.

Fig. 1  Fault-tolerant System


When module Y is connected to the bus,
detectable or undetectable faults (in module Y) will
cause a system failure (incorrect data on the
output bus). Therefore, they do not need to be
distinguished from each other. Also, if a fatal
fault occurs in module Y, the system immediately
fails even if module X was connected to the bus.
In summary, the faults in module Y can be grouped
in two classes: fatal and non-fatal (non-fatal =
detectable and undetectable).

Two fault classes need to be defined for the
recovery mechanism:

4) Faults causing premature switching: J s-a-1,
for example, will cause module Y to be connected to
the bus even though module X is fault-free.

5) Latent faults: A latent fault in the recovery
mechanism will not produce an error until a fault
occurs in the active module. J s-a-0, for example,
will not produce an error until the EXCLUSIVE-OR
gate detects an error in module X.

There are also fatal faults in the switch. If
lead P is s-a-0, for example, both modules (X and
Y) will be disconnected from the bus and the system
fails.

The fault classification is shown in Table 1.
The faults are divided into ten groups. Each row
in Table 1 corresponds to a group and each group
corresponds to one of the classes described above.
More than one group can correspond to the same
class. Groups 2, 8 and 10, for example, all
consist of fatal faults. The total number of
faults in group i is equal to C[i].

Table 1  Fault Classification

Group  C[i]  s-a-0  s-a-1  class

INsOula  II  I  |  a  |  a, I  I  a, a  !  uni-t-ct-vi-  i 

I  I  a  I  I  I  C, 0,01. HI  |  C,D,G1 ,H1  I  fetal  I 

I  t  3  I  zo  I  u.aa.t.r  i  ai.aa.t.r  i  <ataota»ia  i 

I  III  S1.X7.G.H  |  I1.X2.C.H  i  i 

I  -  I  II  02,  K?  |  07. K?  I  I 


lOotoatorl  a  |  t  I 
I  III 

I  I  *  I  I  I. 

I  Sul  tab  I  S  |  J  I 
I  III 


i  |  praaaturo  I 

I  aulUhlax  | 

I  lataat  I 
I  I  T  I  10  I  K.l.N.M  I  7. 71,7?  |  latent

I  I  I  IH2.H71  ,K7?  I  I 

I  I  S  I  T  I  7,71,77  I  X.N7.  I  ratal 

I  I  II  I  N71.H77  I 

IHeUule  It  |  |  7a  |  C[»l  .  C[1]  .  C(J]  I  non-fatal 

I  I  10  I  I  I  C(  to]  •  cm  I  fat-1 


3  RELIABILITY  CALCULATION  USING  CARE  III 

CARE III is a very sophisticated and powerful
reliability model [Bridgman 84]. It can be divided
into two parts: the aggregate model and the
fault-handling (coverage) model. The latter describes
the recovery process in detail. More information
about CARE III can be found in [Bavuso 84] [Trivedi
81]. Figure 2 shows the single fault-handling
model. A fault (with rate f(t)) causes the system
being modeled to go to state A. The fault is active
but no error exists yet. The fault produces an
error (at a rate r(t')) and the system goes from
state A to state A_E. If the error is not fatal,
the system will go to state A_D (at a rate E(t'')
and with a conditional probability c). If the
error is fatal, the system will go to state F. Both
t' and t'' are random variables. It is assumed that
they follow the exponential distribution. 1/r(t')
is the average time for a fault to produce an error
and 1/E(t'') is the average time for that error to
be detected (or cause a system failure). State A_D
indicates that the error was detected; it is
assumed in CARE III that the isolation of the error
and the recovery from it will always be
successful.
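
As an illustration of the state flow just described, the following
sketch integrates a chain 0 -> A -> A_E -> {A_D, F} with constant
rates. It is not CARE III itself: the function name, the Euler
integration, the state label "0" for the fault-free state and the
numerical values are assumptions chosen only to show how c splits
the probability leaving A_E between detection and system failure.

```python
def simulate_single_fault_model(f, r, e, c, t_end, dt=1e-3):
    """Euler integration of 0 -> A -> A_E -> {A_D, F} with constant rates.

    f: fault rate, r: rate at which a fault produces an error,
    e: rate at which an error is detected or becomes fatal,
    c: probability that the error is not lethal.
    """
    p = {"0": 1.0, "A": 0.0, "A_E": 0.0, "A_D": 0.0, "F": 0.0}
    for _ in range(int(round(t_end / dt))):
        to_a = f * p["0"] * dt       # fault occurs
        to_ae = r * p["A"] * dt      # fault produces an error
        out_ae = e * p["A_E"] * dt   # error is detected or is fatal
        p["0"] -= to_a
        p["A"] += to_a - to_ae
        p["A_E"] += to_ae - out_ae
        p["A_D"] += c * out_ae
        p["F"] += (1.0 - c) * out_ae
    return p

# Hypothetical rates, in arbitrary time units.
probs = simulate_single_fault_model(f=1.0, r=50.0, e=50.0, c=0.8, t_end=20.0)
reliability = 1.0 - probs["F"]
```

Once the fault has almost certainly occurred and been handled, the
probability mass ends up split between A_D and F in the ratio
c : (1 - c), which is why c dominates the reliability estimate.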

Only permanent faults will be considered here.
The fault-handling model should be able to
represent all the faults in the 1-out-of-2 system
under study. The latent faults (J s-a-0 for
example) cannot be handled like the other faults.
A latent fault will only affect the system after an
error in module X. The parameter t' (time for fault
to produce an error) will be many orders of
magnitude larger than that of a detectable fault in
module X, for example. The system could be divided
into two subsystems: 1) modules X and Y; 2) the
recovery mechanism. The double fault-handling
model in CARE III [Bavuso 84] can be used to
describe the dependence between the faults in the
two subsystems. However, it will be impossible to
distinguish the latent faults from the rest of the
faults in the recovery mechanism. A solution for
this problem is to divide the faults in the system
into two types: 1) latent faults; 2) all other
faults in the system. Hence, the fault-handling
model has to be used twice.


i.e. the transitions from fault (state A) to error
(state A_E) and that from error to recovery (state
A_D) or system failure (state F), are almost
instantaneous.


Fig. 4  Modeling the latent faults of the system in Fig. 1


The latent faults in the recovery mechanism
are treated separately as shown in Fig. 4. The
parameter c, in this case, is equal to zero because
any fault in the system, while the recovery
mechanism is disabled, will lead to a system
failure. E(t'') is assigned a large constant while
r(t') is equal to the transition rate between
states 0 and A in Fig. 3.

The reliability is calculated as follows
[Trivedi 81]:


A      : Fault is active
A_E    : Fault caused an error
A_D    : Module has been detected as faulty
F      : System failure state
f(t)   : Module failure rate
r(t')  : Rate at which fault produces an error
E(t'') : Rate at which error is detected
c      : Probability that a faulty module is not lethal

Fig. 2  CARE III Single Fault-Handling Model

Reliability(t) = 1 - P(being in state F at time t)


Fig. 3  CARE III applied to system in Fig. 1

In Fig. 3, the fault-handling model is applied
to the faults in modules X and Y as well as the
non-latent faults in the recovery mechanism. The
parameter c (probability that the system can
recover from the error) will be equal to the ratio
of the non-fatal faults to the total number of
faults (non-latent).

$$ c = \frac{\sum_{i \in \{3,4,6,9\}} C[i]}{\sum_{i \in \{1,2,3,4,6,8,9,10\}} C[i]} $$
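
The ratio above is a plain count of faults, which the following
sketch evaluates. The group sizes are hypothetical placeholders and
are not the values of Table 1; the group index sets follow the
equation as it reads here, and the function name is introduced only
for illustration.

```python
def coverage(counts, non_fatal_groups, non_latent_groups):
    """Ratio of non-fatal faults to all non-latent faults."""
    numerator = sum(counts[i] for i in non_fatal_groups)
    denominator = sum(counts[i] for i in non_latent_groups)
    return numerator / denominator

# Hypothetical group sizes C[i], for illustration only.
example_counts = {1: 2, 2: 3, 3: 10, 4: 1, 6: 1, 8: 3, 9: 12, 10: 3}

c = coverage(
    example_counts,
    non_fatal_groups=(3, 4, 6, 9),
    non_latent_groups=(1, 2, 3, 4, 6, 8, 9, 10),
)
```

With these placeholder counts, 24 of the 35 non-latent faults are
non-fatal.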

Since the system is very simple and the clock
cycle is many orders of magnitude smaller than the
mean time between faults, the parameters r(t') and
E(t'') are both assumed to be large and constant,

Fig. 6  Reliability model specific to system in Fig. 1


The reliability of the whole system will be the
product of the reliabilities calculated from the
above two models (Figs. 3 and 4).

$$ \mathrm{Unreliability}(t) = 1 - [1 - P(SF_{\text{non-latent}})]\,[1 - P(SF_{\text{latent}})] $$

where SF_non-latent denotes the event of a system
failure due to a non-latent fault and SF_latent
denotes the event of a system failure due to a
latent fault. Let 2z be the failure rate of any
gate, latch, or fanout branch in the circuit and
assume that the s-a-0 and s-a-1 faults are equally
likely, each with a failure rate of z. The
unreliability is shown in Fig. 5 for z = 0.0001 /
10^6 hours [MIL 82] along with a plot of an
unreliability prediction for the same system
calculated from a Markov model specific to the
system under study (Fig. 6).
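
Because the system reliability is taken as the product of the
reliabilities of the two models, the combination step reduces to
one line. The sketch below shows it with hypothetical failure
probabilities; the function name and the numbers are assumptions
and do not reproduce the curves of Fig. 5.

```python
def unreliability(p_sf_non_latent, p_sf_latent):
    """Combine the failure probabilities of the two fault-handling models."""
    reliability_non_latent = 1.0 - p_sf_non_latent
    reliability_latent = 1.0 - p_sf_latent
    return 1.0 - reliability_non_latent * reliability_latent

# Hypothetical failure probabilities at some mission time t.
u = unreliability(p_sf_non_latent=1e-4, p_sf_latent=2e-5)
```

For small probabilities the result is close to the sum of the two
contributions, since the product term is negligible.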

5  SUMMARY  AND  CONCLUSIONS

A very simple redundant system was designed.
It consists of two identical modules, one active
and the other a powered back-up spare. The
recovery mechanism consists of a detector and a
switch. The faults were divided into five classes
assuming a single stuck-at fault model. The
classification of the faults was determined by a
Daisy Megalogician. The five fault classes were
then used to determine the coverage parameters for
the fault-handling model in CARE III. Even though
only one type of failure (permanent) was modeled,
it was found that the fault-handling (coverage)
model had to be used twice to account for the
latent failures in the recovery mechanism. Each
time the model was used, a different set of
coverage parameters was calculated.

A reliability model specific to the system was
also built and the results were compared to those
obtained with the models in CARE III. It was found
that CARE III gives a conservative estimate of the
reliability of the system.

In conclusion, the failures in the recovery
mechanism made it necessary to "adapt" CARE III in
order to accurately represent the system. Even
though CARE III does not distinguish between active
and stand-by spare modules, it gives a conservative
estimate of the reliability of a stand-by redundant
system.

ABSTRACT


This paper shows that failures caused by
power supply disturbances can be modeled as delay
faults. This conclusion results from experiments
where voltage sags are injected in the power supply
rails of gate arrays and breadboard circuits. The
susceptibility of the circuits to the occurrence of
errors increases with clock frequency. This
dependency can be attributed to the increase in
propagation delay with lower power supply voltages.
Errors are caused by violation of timing
constraints of the circuits. It was also found
that supply disturbances can cause metastability.

1  INTRODUCTION

Power supply disturbances are known to cause
errors in the operation of digital systems. In the
literature ([Allan 83] [Chesney 83]), the
susceptibility of circuits to power supply
disturbances has been characterized by measurements
where logic gates with constant input signals have
their power supply disturbed. By using constant
inputs, the important effect of power disturbances
on propagation delay is underestimated and noise
immunity problems are assumed to be the only cause
of errors. This is not a reasonable assumption in
systems where logic signals are changing with time.
Experimental results show that propagation delay
variation is the dominant effect and that noise
immunity plays a small role in error occurrence.

This paper shows that failures caused by power
supply disturbances can be modeled as delay faults.
Experiments were performed on a CMOS gate array and
breadboard circuits implemented with 74HC and 74LS
catalog parts. It was found that the
susceptibility of the circuits to power supply
voltage disturbances is related to the operating
frequency. Errors are more likely to occur as
operating frequency increases. In the breadboards,
the error mechanism was observed directly by
monitoring voltage waveforms at internal nodes. It
was found that, in most cases, errors were caused
by violation of timing constraints due to the
increase of gate propagation delay during the
disturbances. In the gate array, internal
waveforms could not be observed directly. In order
to analyze the internal behavior, logic simulation
was performed on a gate array model. All output
waveforms observed experimentally were successfully
reproduced by the logic simulation, confirming that
errors were caused by delay effects. In another
experiment, a metastability detector was built in
order to analyze the output waveforms of the
circuits under test. It was found that power
supply disturbances can cause metastability.

The circuits implemented for the experimental work are
described in Sec. 2. The experimental procedure
and data analysis are presented in Sec. 3, followed
by the conclusions in Sec. 4. Preliminary results
on power supply disturbances are presented in [Lu
84] and [Cortes 85].

2  DESCRIPTION  OF  THE  EXPERIMENTAL  SYSTEM 

The circuits utilized in the experiment are
the detector chip, the detector breadboards and the
metastability detector. A description of the chip
and the hardware utilized in the experiment can be
found in [Lu 84] and [Wakerly 82]. A brief
description of the circuits is presented next.

2.1 DETECTOR CHIP. This chip was proposed by
[McCluskey 81]. Its sole purpose is to monitor
itself for temporary errors. In this experiment,
the detector chip used is a CMOS gate array
fabricated by STC (Storage Technology Corporation).
Figure 1 shows the elements of the detector chip
and their interconnections. The basic cell is a
set of three XORs wired in such a way that it has
the following interesting properties: the complete
test set consists of all combinations of
even-weight 3-bit vectors; any even-weight input vector
produces the same vector at the output; any
odd-weight input vector produces an even-weight output
vector. Therefore, when the complete test set
(Fig. 1) is applied, the output lines of the cell
match their inputs. Fifteen basic cells are
cascaded to form a chain. If the circuit is
error-free, the output of a chain is a delayed
version of the input patterns. A set of flip-flops
is added to synchronize the signals. An equality
checker built out of 3 XNORs and an AND gate
provides an active-low error signal denoted LAE for
Look-Ahead Error.
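
The properties listed for the basic cell can be checked with a
short sketch. The wiring used here, where each output is the XOR of
the other two inputs, is only one arrangement of three XORs that
satisfies the stated properties; it is an assumption and may differ
from the wiring of Fig. 1. The function names are likewise
introduced for illustration.

```python
from itertools import product

def basic_cell(v):
    """Three XORs: each output is the XOR of the two other inputs (assumed wiring)."""
    x, y, z = v
    return (y ^ z, x ^ z, x ^ y)

def chain(v, n_cells=15):
    """Cascade of basic cells, as in the detector chip."""
    for _ in range(n_cells):
        v = basic_cell(v)
    return v

def lae(inputs, outputs):
    """Equality checker (3 XNORs and an AND). Active low: 0 signals an error."""
    xnors = [1 - (a ^ b) for a, b in zip(inputs, outputs)]
    return xnors[0] & xnors[1] & xnors[2]

all_vectors = list(product((0, 1), repeat=3))
even_vectors = [v for v in all_vectors if sum(v) % 2 == 0]
odd_vectors = [v for v in all_vectors if sum(v) % 2 == 1]
```

With this wiring every even-weight vector passes through the chain
unchanged, so LAE stays high in the error-free case, while any
odd-weight vector entering a cell is mapped to an even-weight one.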

2.2 DETECTOR BREADBOARDS. In order to permit
the observation of internal signals, a discrete
version of the detector chip was built. It
consists of one module made of 10 basic cells
(Fig. 2). A clock generator, a 3-channel
programmable pattern generator and a voltage level
translator (open collector inverters 54S05) are
also included on the same board. The power supply
disturbances are applied to the circuit under test
(Fig. 2). The rest of the board is supplied with 5
V DC. The clock and pattern generators are
implemented with 74LS parts. The portion of the
board that is subjected to disturbances (circuit
under test, Fig. 2) was implemented with 74HC
catalog parts in the CMOS breadboard and with
74LS catalog parts in the LSTTL breadboard.

Fig. 1: The Detector Circuit
And Its Building Blocks


Fig. 2: The Detector Breadboard


2.3 METASTABILITY DETECTOR. This circuit is
actually a late transition detector and was based
on a circuit suggested by Greg Freeman (Fig. 3)
[Freeman 85]. An error signal is generated
whenever a data transition occurs after the DLYCLK
rising edge. The counter contents indicate the
error rate. This circuit was implemented in
high-performance CMOS parts (74HC). This is an
appropriate scheme to detect metastability when it
is impossible to access the output of the
metastable memory element. This is the case for
the detector chip and the breadboard detectors,
where all signals are buffered. When metastable
signals (metastable voltage levels) drive the
buffers, their outputs are likely to be observed
externally as valid logic signals because of the
high gain of the buffers. Half of the
metastability occurrences result in late transitions
at the buffer output.
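
The counting rule of the late transition detector can be sketched
in a few lines. This is a behavioral illustration only, not the
circuit of Fig. 3: the function name, the use of integer nanosecond
timestamps and the assumption that clock edges fall at multiples of
the clock period are all introduced here.

```python
def count_late_transitions(transition_times_ns, clock_period_ns, window_ns):
    """Count data transitions that occur after the delayed clock (DLYCLK) edge.

    Clock rising edges are assumed at multiples of clock_period_ns, and the
    DLYCLK rising edge is assumed to follow each of them by window_ns.
    """
    late = 0
    for t in transition_times_ns:
        offset = t % clock_period_ns  # time since the last clock edge
        if offset > window_ns:
            late += 1
    return late

# Hypothetical transition times for a 4 MHz clock (250 ns period).
times = [5, 255, 310, 520]
late_40 = count_late_transitions(times, clock_period_ns=250, window_ns=40)
late_60 = count_late_transitions(times, clock_period_ns=250, window_ns=60)
```

Widening the window lowers the count, since only transitions that
resolve later than the window are registered.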


3  EXPERIMENTAL  RESULTS  AND  ANALYSIS

In this section the various experimental
procedures are presented and the data are analyzed.
A sequence of different experiments was performed
on both the detector chip and detector breadboards.

3.1 DETECTOR CHIP. Two types of disturbances
were applied: DC and pulsed (negative pulses). The
susceptibility of the chip to disturbances was
measured by increasing the magnitude of the
disturbance until the first error was observed at
the LAE output pin. The magnitude of the
disturbance is denoted ΔVDD. Low values of ΔVDD
indicate poor tolerance to disturbances.

Fig. 3: The Metastability Detector


DC vs Pulsed. A first set of experiments
was conducted to study the effect of duration of
the disturbances on error occurrence. It was found
that ΔVDD is insensitive to pulse-width variations
(1, 2, 4 and 7 microsec. were used). Furthermore,
the same value of ΔVDD was measured for DC
disturbances. This is not surprising since
propagation delays and transients are much shorter
than 1 microsecond. ΔVDD may exhibit a
non-negligible sensitivity to pulse width for very
short duration disturbances (100 nsec or shorter).
However, voltage sags with such short duration are
very rare [Key 78]. Therefore, it was reasonable
to limit the experiments to DC disturbances only,
and extend the results to typical pulsed
disturbances. The remaining experiments were
conducted using DC only.

DC Disturbances. Figure 4 shows the
variation of the tolerance to disturbance (ΔVDD)
with clock frequency for the CMOS gate array. The
nominal supply voltage is 5 V. One can identify two
regions in the plot: 1) a flat region (ΔVDD = 3.4
V) for frequencies below 2.5 MHz, that shows a weak
dependency on frequency, and 2) a region where the
tolerance to disturbances decreases monotonically
with frequency (from 2.5 MHz up to 12 MHz) that
will be referred to as the frequency-dependent
region.

At low frequencies (flat region) it was
originally thought that errors occurred only
because of noise margin violations: VDD was too low
to guarantee the VIH, VIL for the CMOS gates.
There is now evidence that other effects may
contribute to this behavior as well (Section 3.2).

In the frequency-dependent region, delay
effects were dominant. Errors were caused by
violations of timing constraints due to the
increase of gate propagation delay during the
disturbances. For example, at 10 MHz and for
disturbances larger than 2.3 V (Fig. 4), the delay
through the combinational logic of the detector
chip (the chain in Fig. 1) is longer than the clock
period (100 nsec.). The increase in propagation
delay with lower supply voltages occurs in CMOS and
LSTTL logic and is discussed in [Wagner 85].
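
The timing argument above can be turned into a small numerical
sketch: an error is expected once the chain delay at the reduced
supply exceeds the clock period. The delay-versus-voltage function
below is a hypothetical monotone model with made-up parameters
(nominal stage delay, threshold voltage, exponent); it is not the
measured characteristic of the gate array, and the function names
are introduced only for illustration.

```python
def gate_delay(vdd, d_nom=4e-9, v_nom=5.0, v_t=0.8, alpha=1.3):
    """Hypothetical per-stage delay (s) that grows as the supply voltage drops."""
    return d_nom * (vdd / v_nom) * ((v_nom - v_t) / (vdd - v_t)) ** alpha

def tolerance(freq_hz, n_stages=15, v_nom=5.0, v_t=0.8):
    """Largest supply dip (V) for which the chain delay fits in one clock period."""
    period = 1.0 / freq_hz

    def fits(dip):
        return n_stages * gate_delay(v_nom - dip, v_nom=v_nom, v_t=v_t) <= period

    if not fits(0.0):
        return 0.0  # timing is already violated at the nominal supply
    lo, hi = 0.0, v_nom - v_t - 1e-6
    for _ in range(60):  # bisection on the monotone delay curve
        mid = 0.5 * (lo + hi)
        if fits(mid):
            lo = mid
        else:
            hi = mid
    return lo

tol_5mhz = tolerance(5e6)
tol_10mhz = tolerance(10e6)
tol_short_chain = tolerance(10e6, n_stages=4)
tol_long_chain = tolerance(10e6, n_stages=10)
```

Under this model the tolerated dip shrinks as the clock frequency
rises and grows when the chain is shortened, which is the
qualitative trend of the frequency-dependent region.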


Fig. 4: Tolerance to Power Supply
Disturbances vs Frequency
for the Gate Array
(vertical axis: VDD - ΔVDD, in V)


Internal nodes of the gate array could not be
observed directly. In order to confirm that the
observed output waveforms resulted from delay
effects, a logic simulation model of the gate array
was implemented on a Megalogician Logic Simulator
(Daisy Systems Corp.). In the simulation model,
the gates have a constant delay, the clock
frequency is varied and errors occur because of
timing constraint violations. This is equivalent
to having the clock frequency fixed and varying the
gate delay parameter in the logic simulation model.
In this way, the external behavior of a
circuit-level dependency (delay vs power supply) can be
studied with a logic level equivalent model (timing
constraint violation). The analysis of the circuit
was substantially simplified by the use of the
logic simulation tool. All the output waveforms
observed experimentally were successfully
reproduced by the logic simulation. This confirms
that transient errors due to power supply
disturbances, in the frequency-dependent region,
can be modeled as delay faults.

3.2 CMOS AND LSTTL BREADBOARD DETECTORS.
Similar experiments were performed on breadboard
versions of the gate array: the CMOS breadboard
(74HC catalog parts) and the LSTTL breadboard
(74LS TTL catalog parts). The breadboards allow
reconfiguration of the chain (changes in chain
length) and easy observation of the error
mechanisms. The results are presented in Fig. 5
for the CMOS breadboard and in Fig. 6 for the LSTTL
breadboard. The data points labeled 10 stages and 4
stages were obtained on CMOS breadboards having
different chain lengths (number of cells); the
disturbance magnitude (ΔVDD) was increased and the
error output LAE was observed for errors. In the
data points labeled Input Error, ΔVDD is the
disturbance magnitude needed to cause an incorrect
vector at S0 (chain input in Fig. 2). The CMOS and
LSTTL breadboards exhibit a behavior similar to
that of the gate array (Fig. 4).


Fig. 5: Tolerance to Power Supply
Disturbances vs Frequency
for the CMOS Breadboard
(vertical axis: VDD - ΔVDD, in V)


Fig. 6: Tolerance to Power Supply
Disturbances vs Frequency
for the LSTTL Breadboard
(vertical axis: VDD - ΔVDD, in V)


The error mechanism was observed by probing
the breadboards. Increases in disturbance
magnitude (ΔVDD) caused the signal transitions at
the chain output (S10 in Fig. 2) to approach the
clock edge. An error was detected by LAE when
set-up time was violated at S10. This observation led
to the conjecture that metastability can occur.

The shape of the curves depends on the
variation of delay-per-gate with supply voltage and
on the length of the longest delay path in the system.
When the chain length in the CMOS breadboard was
changed from 10 to 4 stages (Fig. 5), the tolerance
to disturbances increased from 1.20 V to 2.57 V (at
8 MHz).

In the flat region, as the disturbance
increased, erroneous input vectors appeared at the
chain input (S0, Fig. 2). This erroneous injection
was responsible for the shape of the flat region in
the plots. This was caused by timing degradation
in the CMOS breadboard and noise immunity problems
in the LSTTL breadboard.

3.3 METASTABILITY. Measurements with the
CMOS breadboard (Section 3.2) showed that the
conditions for metastability were present. For
some combined frequency and voltage conditions,
set-up time violations were observed. The goal of
this experiment was to detect metastability using
the circuit described in Section 2 (Fig. 3). No
attempt was made to fully characterize the
metastability. Metastability was detected in both
the CMOS breadboard and the detector chip. For
simplicity, only the CMOS breadboard experiment is
described here. The signals labeled Xout, Yout,
Zout (Fig. 2) were connected (one at a time) to the
data input of the metastability detector (Fig. 3).
Both circuits used the same clock signal. The
window in Fig. 3 represents the minimum duration of
metastable states which would cause an error count.
For 4.0 MHz and VDD = 2.32 V metastability was
detected. Note that this voltage is precisely
the value of VDD - ΔVDD for 4.0 MHz in Fig. 5.
The occurrence was very
sensitive to voltage and frequency. Any drop in
the power supply voltage caused the condition to
disappear. The error rates observed (late
transitions count) are shown in Table 1.


TABLE 1: Metastability Error-Rate versus Duration

window (nsec)    rate (late transitions / sec)
40               very high
60               320
80               2
100              2

This experiment shows that metastability can
occur in fully synchronous systems as well.


4  SUMMARY  AND  CONCLUSIONS

One of the consequences of the
power-supply/delay dependency is that computer systems
designed with tight timing will tolerate only very
small power supply disturbances. As an example,
suppose the CMOS breadboard is operating with a 10%
delay margin at 5 V / 9.3 MHz. It will tolerate a
maximum voltage dip of 0.5 V (Fig. 5). This
tolerance may be further reduced by other factors
like parameter variations, temperature variation,
noise, etc. In applications where speed is not
critical, the lowest clock frequency to meet the
specifications should be used, thereby improving
the tolerance to power disturbances. Digital
systems used for process control in industrial
plants are examples of such systems. When general
purpose computer systems are used, they are exposed
to unnecessary risk because usually the clock
frequency is factory preset to obtain high
performance (tight timing). Evidence of
metastability was a byproduct of the experiments.
It was shown that metastability can occur in a
fully synchronous environment. The experimental
results and conclusions presented here can be
extended to other logic families with
supply-voltage / propagation-delay dependency [Wagner 85].
ABSTRACT

A concurrent built-in logic block observation
(CBILBO) technique for on-chip or on-board
self-test is presented in this paper. This technique
is derived by combining the scan and BILBO
techniques. It allows test pattern generation and
response analysis to be performed simultaneously
during self-testing. With this approach, test time
is reduced to one-half compared to the BILBO
approach, and is much less than with the scan
approach.

1  INTRODUCTION 

Design for testability (DFT) techniques [McCluskey
84b] can be used to reduce test cost of VLSI/board
circuits. Two DFT techniques that are commonly
used are scan testing [McCluskey 84b] [Williams 73]
[Eichelberger 77] and built-in self-test [McCluskey
85a] [McCluskey 85b] [Wang 85].

Scan testing requires that every D flip-flop (D-FF)
and D-latch be reconfigurable into a scan path D-FF
(SPFF) [Williams 73] [McCluskey 81] or an LSSD
shift register latch (SRL) [Eichelberger 77]. Test
patterns and output responses are scanned in and
out of the chip or board during test mode. With
this approach, sequential logic can be transformed
into combinational logic and test logic is
self-testable. However, test time can be very long
due to the serial scan in-and-out property.

Built-in self-test (BIST) requires that test
patterns be applied using on-chip or on-board test
pattern generators (TPGs) and output responses be
compacted to a single word (signature) using output
response analyzers (ORAs). The TPG can be a binary
counter, a syndrome driver counter [Barzilai 81], a
constant weight counter [McCluskey 84a], a built-in
logic block observer (BILBO) [Konemann 80], or a
linear feedback shift register (LFSR) [Wang 86].
The ORA is usually a multiple-input or parallel
signature analyzer (MISA) [Konemann 80], which is an
LFSR with an Exclusive-OR (XOR) gate placed at the
data input of each SPFF or SRL [Benowitz 75]
[Frohwerk 77]. The TPG and the MISA can be
reconfigured from the corresponding input and
output registers of the circuit under test
[McCluskey 81], respectively. A BILBO can also be
employed to serve the above purposes. If BILBOs
are used, to achieve 100% single stuck fault
coverage, consecutive logic blocks must be either
tested at different times or tested alternately as
described in [Williams 83].
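
The TPG/ORA pairing described above can be illustrated with a small
behavioural sketch. It is not taken from the paper: the 3-bit width,
the feedback taps, the seed and the example circuit under test
(`good_cut`, `faulty_cut`) are all assumptions chosen for illustration.
An autonomous LFSR supplies the patterns and a MISA (the same LFSR with
the response XORed into the register on every clock) compacts the
responses into a signature.

```python
WIDTH = 3
TAPS = (2, 1)            # assumed feedback taps; they give a period of 7
MASK = (1 << WIDTH) - 1


def lfsr_next(state):
    """Advance a Fibonacci LFSR by one clock."""
    feedback = 0
    for tap in TAPS:
        feedback ^= (state >> tap) & 1
    return ((state << 1) | feedback) & MASK


def lfsr_patterns(seed, count):
    """Autonomous LFSR used as a test pattern generator (TPG)."""
    state = seed
    patterns = []
    for _ in range(count):
        patterns.append(state)
        state = lfsr_next(state)
    return patterns


def misa_signature(responses, seed=0):
    """Multiple-input signature analyzer: LFSR step XORed with each response word."""
    state = seed
    for word in responses:
        state = lfsr_next(state) ^ (word & MASK)
    return state


def good_cut(x):
    """Hypothetical fault-free circuit under test (binary to Gray code)."""
    return (x ^ (x >> 1)) & MASK


def faulty_cut(x):
    """Same circuit with output bit 0 stuck-at-0 (illustrative fault)."""
    return good_cut(x) & 0b110


patterns = lfsr_patterns(seed=1, count=7)
good_signature = misa_signature(good_cut(p) for p in patterns)
faulty_signature = misa_signature(faulty_cut(p) for p in patterns)
print(patterns, good_signature, faulty_signature)
```

In this sketch the seven patterns cover every non-zero 3-bit input, and
the stuck-at fault changes the final signature, which is how a
signature comparison would flag the faulty block.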


*  L.T. Wang is also with Digital Design Automation,
Daisy Systems Corp., Mountain View, CA 94039.


This  paper  presents  a  new  self-testing  scheme  so 
that  test  time  can  be  reduced  to  one-half  compared 
to  the  BILBO  approach.  The  concurrent  BILBO 
(CBILBO)  [Wang  85]  is  constructed  in  such  a  manner 
that,  during  self-testing,  consecutive  logic  blocks 
are  simultaneously  tested.  If  exhaustive  or 
pseudo-exhaustive  test  patterns  [McCluskey  84a] 
[Wang 86] are used, the CBILBO can achieve 100%
single stuck fault coverage.

2  BUILT-IN  LOGIC  BLOCK  OBSERVER  (BILBO) 

The structure described in [Konemann 80] applies to
circuits that can be partitioned into independent
modules (logic blocks). Each module is assumed to
have its own input and output registers, or such
registers are added to the circuit where necessary.
The registers are redesigned so that for test
purposes they can act as either autonomous LFSRs
(TPGs) for test generation or MISAs for signature
analysis. The redesigned register is called a
BILBO (built-in logic block observer).

The BILBO is operated in four modes: normal mode,
reset mode, test generation or signature analysis
mode, and scan mode. This technique is most
suitable for circuits that can be partitioned so
that input and output registers of the resulting
modules can be reconfigured independently. If
consecutive modules have to be tested
simultaneously, since the test generation and
signature analysis modes cannot be separated, the
signature data from the previous module must be
used as test patterns for the next module. In this
case, a detailed simulation is required in order to
achieve 100% single stuck fault coverage.

3  MODIFIED  BILBO  (MBILBO) 

One technique that overcomes the above BILBO
problem is described in [McCluskey 81]. It uses an
additional control input to separate test
generation mode from signature analysis mode, and
eliminates propagation gate delay by integrating
the additional circuitry into the original
flip-flop design.

Such a modified BILBO (MBILBO) design is shown
in Fig. 1. It is a revised version of the design
given in [McCluskey 81], where only three modes of
operation are considered: normal mode, test
generation mode, and signature analysis mode. The
modification is obtained from the original BILBO by
adding one more OR gate to each Zi input. The
control input B3 is always set to 0 except when the
register has to be configured into a TPG. In that
case, B3 is set to 1. Fig. 2 shows an MBILBO
one-cell structure which integrates the additional
circuitry into a D flip-flop. With this approach,
all MBILBO inputs are effectively placed in
parallel, and thus no additional gate delay is
introduced. There may be some decrease in speed
due to increased loading.

B1  B2  B3  Operation Mode
1   1   0   Normal
0   1   0   Reset
1   0   0   Signature analysis
1   0   1   Test generation
0   0   0   Scan
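
The control encoding in the table above can be written as a simple
lookup. This is only a sketch for reading the table: the function name
`mbilbo_mode` and the choice to reject unlisted combinations are
assumptions, since the text does not say how the remaining three
combinations behave.

```python
# (B1, B2, B3) -> operation mode, copied from the table above
MBILBO_MODES = {
    (1, 1, 0): "normal",
    (0, 1, 0): "reset",
    (1, 0, 0): "signature analysis",
    (1, 0, 1): "test generation",
    (0, 0, 0): "scan",
}


def mbilbo_mode(b1, b2, b3):
    """Return the MBILBO operation mode for one setting of the control inputs."""
    try:
        return MBILBO_MODES[(b1, b2, b3)]
    except KeyError:
        # Combinations not listed in the table are treated as undefined here.
        raise ValueError(f"undefined control setting: {(b1, b2, b3)}")


print(mbilbo_mode(1, 0, 1))
```

Note that B3 is 0 in every mode except test generation, which matches
the statement that B3 is set to 1 only when the register is configured
as a TPG.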


Figure 1. A 3-stage modified BILBO (MBILBO).


(A) One-cell structure    (B) Gate-level design

Figure 2. An MBILBO one-cell structure.

4  CONCURRENT BILBO (CBILBO)

In the MBILBO approach, system registers (either
input or output registers) are reconfigured into
either TPGs or MISAs, but not both at the same
time. To achieve 100% single stuck fault coverage,
this requires that consecutive modules be tested at
different times or tested alternately. For
circuits where test time is a critical parameter,
use of the concurrent BILBO (CBILBO) approach
presented here can reduce test time to one-half.

The CBILBO combines the design of a TPG and an MISA.
It generates test patterns and compacts output
responses simultaneously during self-testing. The
CBILBO can be implemented using either D flip-flops
(D-FFs) or D-latches.

Fig. 3 shows a 3-stage CBILBO using D-FFs. Each
CBILBO stage consists of signature logic (including
an AND gate and an XOR gate), a D-FF, and a
two-port D-FF. The top 3 D-FFs and signature logic
constitute a 3-stage MISA and the bottom two-port
D-FFs constitute a 3-stage TPG.

Figure 3. A 3-stage concurrent BILBO (CBILBO) using D-FFs.

This D-FF type CBILBO is controlled by two control
inputs, B1 and B2. When B2=0, the circuit is
operated in normal mode. It functions as a
parallel read-in register with the inputs Zi gated
directly into the two-port D flip-flops. When B1=1
and B2=1, the register is configured into a serial
read-in shift register. Test data can be scanned
in via the serial input port or scanned out via the
serial output port. Setting B1=0 and B2=1 converts
the register into a combination of a TPG and an
MISA. Fig. 4 shows a CBILBO one-cell structure
which integrates the additional circuitry into the
flip-flop. With this approach, no additional gate
delay is introduced although some speed degradation
may still exist.
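
As a reading aid, the control settings of the D-FF type CBILBO and the
claimed halving of test time can be sketched as follows. The function
names are hypothetical, and the cycle count is a deliberately crude
model that assumes N patterns per logic block and ignores the cycles
needed to scan in seeds and scan out signatures.

```python
def cbilbo_mode(b1, b2):
    """Operation mode of the D-FF type CBILBO for control inputs B1, B2."""
    if b2 == 0:
        return "normal"          # parallel read-in register, B1 is a don't-care
    if b1 == 1:
        return "scan"            # serial read-in shift register
    return "test generation and signature analysis"


def self_test_cycles(patterns_per_block, concurrent):
    """Clock cycles to test consecutive logic blocks (simplified model).

    With BILBO/MBILBO registers a register is either a TPG or an MISA, so
    consecutive blocks are tested alternately in two sessions. A CBILBO
    register does both at once, so a single session suffices.
    """
    sessions = 1 if concurrent else 2
    return sessions * patterns_per_block


n = 1024  # illustrative number of patterns per block
print(cbilbo_mode(0, 1))
print(self_test_cycles(n, concurrent=False), self_test_cycles(n, concurrent=True))
```

Under these assumptions the concurrent scheme needs half as many
test-mode clock cycles as the alternating scheme, which is the one-half
figure quoted in the text.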

Fig. 5 shows a 3-stage CBILBO using D-latches. Its
functions are controlled by four non-overlapping
clocks (SCK, TCK, CK2 and CK3) and two control
inputs (B1 and B2). Each CBILBO one-cell structure
consists of signature logic (including two AND
gates and one XOR gate), and three D-latches (L1,
L2 and L3). The L1 and L2 latches act as one TPG
stage (a two-port D-FF) to generate test patterns
to the next circuit under test (CUT). The
signature logic and the L1 and L3 latches act as
one MISA stage (a D-FF) to compact output responses
from the previous CUT. Fig. 6 shows a gate-level
design of the CBILBO one-cell structure using
D-latches. The structure is very similar to the
stable SRL (SSRL) for interfacing LSSD logic to
non-LSSD logic [DasGupta 81].


Clocks            B1  B2  Operation Mode
SCK/CK2           1   0   Normal
TCK/CK2           0   1   Scan
SCK/CK2/CK3       0   0   Reset
SCK/CK3/TCK/CK2   1   1   Test generation and Signature analysis


Figure 5. A 3-stage concurrent BILBO (CBILBO) using D-latches.


(A) One-cell structure    (B) Gate-level design

Figure 4. A CBILBO one-cell structure using D-FFs.

In summary, the D-latch type CBILBO requires three
additional non-overlapping clocks compared to the
D-FF type CBILBO. However, unlike the CBILBO which
uses D-FFs and has an inevitable, essential hazard
[McCluskey 86], this CBILBO using D-latches is
hazard-free.

5  SUMMARY  AND  CONCLUSIONS 

A modified BILBO (MBILBO) design is first
presented. Compared to the original BILBO
[Konemann 80], this MBILBO separates test
generation mode from signature analysis mode, and
is thus suitable for built-in self-test. By
integrating the additional circuitry into the
system registers, no additional gate delay is
introduced. Designs of two kinds of concurrent
BILBOs (CBILBOs), using D flip-flops (D-FFs) and
D-latches, are then discussed. Both CBILBOs further
reduce test time to one-half compared to the BILBO
or MBILBO approach.


(A)  One-cell  structure 


(B)  Gate-level  design 

Figure  6.  A  CBILBO  one-cell  structure  using  D-latches 
Table 1 compares various design approaches. It can
be seen that the proposed CBILBO (either D-FF type
or D-latch type), like the LSSD SRL, the stable SRL
(SSRL) [DasGupta 81], and the BILBO, has some
disadvantages. For example, it needs more pins; it
increases hardware overhead; and it may introduce
additional gate delay during normal operation.


ability," Proc., 14th Design Automation Conf.,
pp. 462-468, Las Vegas, Nevada, June 1977.

[Frohwerk 77] Frohwerk, R.A., "Signature Analysis:
A New Digital Field Service Method,"
Hewlett-Packard Journal, pp. 2-8, May 1977.


Although there is slight performance degradation
due to the addition of extra gate delay in the
D-latch type CBILBO, both CBILBOs offer many distinct
features. First, they have all the benefits
offered by the scan approach. For example, any
sequential logic can be transformed into
combinational logic and test logic is still
self-testable. Secondly, they allow interfacing LSSD
logic with non-LSSD logic as offered by the SSRL
approach. Thirdly, neither software test
generation nor fault simulation is required, as in
the BILBO approach. Finally, test time is much
less than with the scan approach, and can be
one-half the test time of the BILBO or MBILBO approach.


[Konemann 80] Konemann, B., J. Mucha, and G.
Zwiehoff, "Built-In Test for Complex Digital
Integrated Circuits," IEEE J. Solid-State
Circuits, pp. 315-318, June 1980.


[McCluskey 81] McCluskey, E.J., and S.
Bozorgui-Nesbat, "Design for Autonomous Test," IEEE
Trans. on Circuits and Systems, pp. 1070-1079,
Nov. 1981.
Table 1. CBILBO comparison with other designs.

Parameter vs. Type:  LSSD SRL | BILBO | MBILBO | Stable SRL | D-FF CBILBO | D-Latch CBILBO

Pin Count
Clock/Test Pins
Hardware Overhead*
Extra Gate Delays
Test Data Storage*
Test Times*

*: Values are given by ratio. N: Number of test patterns for each scan path.
POWER SUPPLY DISTURBANCES are known to cause errors in the
operation  of  digital  systems.  In  the  literature 1  ’J  the  susceptibility  of 
circuits to power supply disturbances (PSD) has been characterized by
measurements where logic gates with constant input signals have their
power supply disturbed. By using constant inputs, the important effect
of power disturbances on propagation delay is underestimated and
noise immunity problems are assumed to be the only cause of errors.

In systems where logic signals are changing with time, this is not a
reasonable assumption. Experimental results show that propagation
delay variation is the dominant effect and that noise immunity plays
a small role in error occurrence. It will be shown in this paper that
failures caused by power supply disturbances can be modeled as delay
faults.

The subjects of the experiments were: a CMOS gate array** and
discrete versions of the gate array implemented with catalog parts***.
Figure 1 shows the logic diagram of the circuit used in the experiment
(detector circuit)1. It consists of cascaded basic cells, synchronizing
elements and checking circuitry. When the complete test set for the
Exclusive ORs (XORs) (Figure 1) is applied to a basic cell, its output
signals match the input signals. Therefore, if the circuit is error-free
the output of a chain is a delayed version of the input pattern and the
error output signal (LAE) is inactive.

The experimental conditions were as follows: Voltage dips (negative
pulses) were injected in the power rails of the detector circuits
while the complete test set was repeatedly applied to its inputs and
the error output (LAE) observed with an oscilloscope. For a given
operating frequency the magnitude of the disturbances was increased until
the first error was observed. Only dc disturbances were used, i.e., pulse
durations longer than the device propagation delay. This is a reasonable
simplification since short power supply disturbances (100ns or shorter)
are very rare4.

Figure 2 shows the dependency of disturbance magnitude (ΔVDD)
on clock frequency for the three circuits: the CMOS gate array (detector
chip) and the CMOS (CMOS breadboard) and LSTTL (LSTTL
breadboard) discrete versions. The nominal supply voltage is 5V.


•This project was supported in part by International Business
Machines (IBM) under a contract with Palo Alto Research
tics Science and Technology Office of the Strategic Defense
Initiative Organization and was administered through the Office
of Naval Research under Contract No. N04014-85-K-0600. The
logic simulation was performed on a Megalogician workstation
made possible by Daisy Systems Co. (Mountain View, CA).

••Fabricated by Storage Technology Corp.
One can identify two regions in the plot: (1) a flat region for lower
frequencies (below 2.5MHz for the gate array) and (2) a
frequency-dependent region (above 2.5MHz for the gate array) where the
tolerance to disturbances decreases as the clock frequency increases. The
three curves exhibit the same behavior; the CMOS breadboard has a
small flat region, noticeable in an expanded plot. The shape of the
curve depends on the variation of delay-per-gate with supply voltage
and the length of the longest delay path in the system.

For low frequencies (flat region), noise-immunity problems are
one of the causes of the observed errors. In the frequency-dependent
region, delay effects are dominant. Errors occur because, during the
disturbances, gate propagation delay increases and timing constraints
are violated. The increase in propagation delay with lower supply
voltages occurs in CMOS and LSTTL logic and was discussed earlier5.

In the gate array experiment, internal nodes could not be observed
directly. To confirm that errors were caused by delay effects, a logic
simulation model of the gate array was built. All waveforms obtained
experimentally were successfully reproduced by the logic simulation.
This confirms that transient errors due to power supply disturbances,
in the frequency-dependent region, can be modeled as delay faults.

In the breadboard experiments, it was possible to observe the
error mechanism directly. Increasing disturbances caused signal
transitions at the input of Tri-reg (Figure 1) to approach the clock edge.
An error was detected by LAE when set-up time was violated. This
observation led to the conjecture that metastability can occur.

A metastability detector was built (late transition detector) and
metastable states lasting Rthis or shorter were detected. This shows
that power supply disturbances can cause metastability (even in a
fully synchronous circuit).

In another experiment with the CMOS breadboard, the length of
the longest delay path in the circuit was changed by reducing the
number of basic cells in the chain (Figure 1) from 10 to 1. The shape
of the curve (not shown here) varied accordingly and the tolerance to
supply voltage disturbance was higher for the shorter path circuit.

One of the consequences of the power-supply/delay dependency
is that computer systems designed with tight timing will tolerate only
very small power supply disturbances. As an example, suppose the
CMOS breadboard is operating with a 10% delay margin at 5V/9.3MHz.
It will tolerate a maximum voltage dip of 0.5V (Figure 2). This
tolerance may be reduced further by other factors, e.g., parameter
variations, temperature variation, noise, etc.

In applications where speed is not critical, the lowest clock
frequency that meets the specifications should be used, thereby improving
the tolerance to power disturbances. Digital systems used for process
control in industrial plants are examples of such systems. When
general-purpose computer systems are used, they are exposed to
unnecessary risk because the clock frequency is usually factory preset to
obtain high performance (tight timing).

The experimental results and conclusions presented can be extended
to other logic families with power-supply-voltage/propagation-delay
dependency.
ABSTRACT

This paper describes a design technique which facilitates
testing for stuck-open faults in CMOS VLSI circuits with scan-path. In
this technique, the combinational circuitry is implemented with
specially designed gates which can be tested with a simplified
2-pattern test for stuck-open faults. The simplified 2-pattern test
cannot be invalidated by arbitrary circuit delays and it can be applied
through the scan-path by specially designed shift register latches.

1  INTRODUCTION 

Due to advances in VLSI technology, hundreds and
thousands of devices can be fabricated on an IC chip. A VLSI circuit
is extremely difficult to test due to its high device-to-pin ratio. To
alleviate this testing problem, it is advantageous to use a Design For
Testability (DFT) technique during the design of VLSI circuits.

One example of DFT is the scan-path technique, which
facilitates IC testing in two ways: (1) scan-paths partition an IC into
several blocks which can be tested independently, and (2) scan-path
design transforms a sequential circuit into a combinational circuit for
which Automatic Test Pattern Generation (ATPG) is relatively
straightforward [McCluskey 84]. However, certain faults in a CMOS IC
may cause a combinational circuit to exhibit sequential behavior, thus
invalidating the sequential-to-combinational transformation of the
scan-path design. Such a fault causes an FET to remain non-conducting
irrespective of the applied voltage level at the FET gate
terminal and is called an FET stuck-open fault (sop fault)
[Wadsack 78].

The test for a sop fault in a CMOS logic gate consists of two
test patterns: an initializing input pattern and a test input pattern. The
initializing input (T1) places the output of a faulty CMOS logic gate at
logic-1 to detect an NFET sop fault or at logic-0 to detect a PFET
sop fault. The test input (T2) then sensitizes the effect of the fault to
the output node of the logic gate and propagates this effect to an
observable output. The generation of a 2-pattern test is a nontrivial
task.
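
The two-pattern procedure can be made concrete with a switch-level
sketch of a single gate. The 2-input NAND, the transistor labels (PA,
PB, NA, NB) and the function names are assumptions introduced here for
illustration; they do not come from the paper. The model keeps the
previous output value when neither network conducts, which is the
sequential behavior a stuck-open fault introduces.

```python
def nand2_step(a, b, prev_z, stuck_open=None):
    """Evaluate a 2-input CMOS NAND once at switch level.

    stuck_open names one permanently non-conducting FET:
    "PA"/"PB" (parallel PFETs to Vdd) or "NA"/"NB" (series NFETs to ground).
    """
    pa = a == 0 and stuck_open != "PA"   # PFETs conduct on logic-0
    pb = b == 0 and stuck_open != "PB"
    na = a == 1 and stuck_open != "NA"   # NFETs conduct on logic-1
    nb = b == 1 and stuck_open != "NB"
    pull_up = pa or pb
    pull_down = na and nb
    if pull_up and pull_down:
        raise ValueError("Vdd-to-ground path; not expected for this gate")
    if pull_up:
        return 1
    if pull_down:
        return 0
    return prev_z                        # floating output retains its charge


def two_pattern_test(t1, t2, stuck_open=None, initial_z=0):
    """Apply initializing pattern T1, then test pattern T2; return the final Z."""
    z = nand2_step(t1[0], t1[1], initial_z, stuck_open)
    return nand2_step(t2[0], t2[1], z, stuck_open)


T1 = (1, 1)   # all-1 pattern: initializes Z to logic-0 for a PFET fault
T2 = (0, 1)   # turns on PFET PA only
print(two_pattern_test(T1, T2), two_pattern_test(T1, T2, stuck_open="PA"))
```

With T1 applied first, the fault-free gate ends at logic-1 while the
gate with PA stuck open stays at logic-0, so the fault is observable.
Applying T2 alone to a node that already holds logic-1 would not
distinguish the two cases, which is why the initializing pattern is
needed.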

[Jain 83] and [Reddy 83] have reported that arbitrary delays
in the Circuit Under Test (CUT) can invalidate a 2-pattern test.
Generation of hazard-free test patterns which are able to detect sop
faults even under arbitrary delay conditions is required [Reddy 84].
Not only are test patterns of this nature much more difficult to
generate than stuck-at fault patterns, but they are almost impossible
to apply to a scan-path IC. This is because the process of shifting in
the T2 pattern will alter the state of the CUT established by the T1
pattern.

Several researchers proposed design techniques as solutions
to the problem of testing for sop faults [McCluskey 61; Reddy 83; Jha
85; Zasio 85]. However, none addressed the problem of testing sop
faults for scan-path ICs adequately. This paper presents a solution to
this problem. It is based on a testable CMOS combinational circuit
design technique.


2  A  TESTABLE  CMOS  LOGIC  CIRCUIT  DESIGN 

This section describes a technique for designing testable
CMOS combinational circuits. The proposed technique is based
upon two assumptions about the CUT. First, it assumes that there is
only one sop fault in the CUT. Second, the combinational part of the
CUT does not contain transmission gates.

2.1  A  fully  complementary  gate  structure 

A CMOS combinational circuit is constructed by
interconnecting CMOS gates. Figure 1 shows a block diagram of a
CMOS gate which consists of a pull-up network of PFETs (p-net) and
a pull-down network of NFETs (n-net). Most CMOS gates used in a
logic circuit are what may be referred to as fully complementary gates.
These gates are defined as follows.

Figure 1. Block diagram of a CMOS gate.


[Definition 1] A Fully Complementary Gate (FCG) is a CMOS gate
with the following properties: (1) each logic gate input is connected
to the transistor gate terminals of both a PFET and an NFET, and (2)
the p-net provides conduction paths for all input combinations for
which output Z is logic-1; the n-net provides conduction paths for all
input combinations for which output Z is logic-0 [Mukherjee 86].

An example of an FCG is shown in Fig. 2. It realizes an AOI
gate. Notice that an FCG can realize any combinational Boolean
function if double-rail inputs are available.


Figure 2. An AOI gate.


[Property 1] (FCG initialization)

The output node Z of an FCG can be initialized to logic-0 if all of its
inputs are set to logic-1. Similarly, the output Z can be initialized to
logic-1 if all the inputs are set to logic-0. (During the initialization, if
both x and its complement x' are inputs to an FCG, they are assigned
the same logic value.)

Normally it is not possible to set both input x and x' to the same
logic value unless they are primary inputs. However, such an input
condition can be established in a testable combinational circuit to be
described later in this section.

2.2  Stuck-open  fault  test  patterns 

To detect a sop fault in an FCG requires two patterns. The
initializing input T1 can be easily determined using Property 1.
The test input T2 is derived by assigning values to each input
variable such that it can (1) turn on the faulty FET (Pi), (2) turn on
enough FETs so that a conduction path from Vdd through Pi to the
output exists if the fault is in the p-net, or a conduction path from the
output through Pi to ground exists when the fault is in the n-net, and
(3) turn off enough FETs so that no conduction path exists which
does not include Pi [El-Ziq 81]. This type of 2-pattern test uses an
all-1 pattern or an all-0 pattern as the T1 input, hence it will be called a
simplified 2-pattern test.

The next two examples illustrate the derivation of simplified
2-pattern tests for FCGs. The first example is an AOI gate, and the
second is an 8-input complex gate.

[Example 1] Find a 2-pattern test for the FET P2 sop fault in the AOI
gate shown in Fig. 2.

This circuit is taken from [Jain 83]. In that paper, this circuit is
used to illustrate that some 2-pattern tests may be invalidated due to
timing skews in multiple input changes (e.g. T1 = 1 0 0 and T2 = 0 0 1).
The authors proposed a 2-pattern test (i.e. T1 = 0 1 1 and
T2 = 0 0 1) in which only one input makes a transition.

In our approach, the T1 input is the all-1 pattern because the
faulty FET P2 is in the p-net. The T2 input sets A to logic-0, B to
logic-0 and C to logic-1. This 2-pattern test contains two input (A and
B) changes. While changing both A and B to 0, a conduction path
between Vdd and Z is established to provoke the fault. It is apparent
that the order of these two input transitions will not affect the
detectability of the P2 sop fault.


A test for P2 sop fault.

[Example 2] Find a 2-pattern test for the FET P5 sop fault in the circuit
of Fig. 3.

[Reddy 83] presented this circuit to demonstrate that
because of arbitrary delays no 2-pattern test exists for the P5 sop
fault. However, if every input variable and its complement can be set
to logic-1 at the same time, a valid 2-pattern test for this fault can be
derived.

To detect the P5 sop fault, a simplified 2-pattern test as listed
below is applied to the complex gate. This 2-pattern test contains
four input transitions. Input B and D changes have no effect on the
output node Z. Input A' and C' changes provoke the fault.


A test for P5 sop fault.

Although this test involves multiple input transitions, its validity
still holds in the presence of arbitrary delays. There are four inputs
(A, B', C, D') which are not changed in this 2-pattern test. These
inputs block all the conduction paths between Vdd and Z, except for
the path consisting of FETs P5 and P6. Therefore, node Z cannot be
accidentally changed to logic-1 before the T2 input is applied to the
circuit.


The following theorem formally states the validity of simplified
2-pattern tests under arbitrary circuit delays.


[Theorem 1] If an FCG implementation of a Boolean function is
irredundant, then for each of its FET sop faults, there exists a
simplified 2-pattern test which cannot be invalidated by arbitrary
circuit delays.

Proof: Assume the faulty transistor in an FCG is the PFET Pi
controlled by the input variable xi. Since the fault is in the p-net, the
T1 input of the simplified 2-pattern test is the all-1 pattern. The T2
input can be derived using the method of Boolean difference. First,
find the transmission T of the p-net. Second, find the Boolean
difference of T with respect to xi. Let dT/dxi = f(x1, ..., xn). Since the
FCG is irredundant, its transmission T will not be vacuous in xi. A T2
input can be found by assigning logic-0 to input variable xi and
assigning proper logic values to the other input variables such that
dT/dxi = 1. (If input variable xi controls another FET Pk in the
p-net, a T2 input can still be derived by assigning values to the other input
variables so that a conduction path through Pk will not exist.) This
test input will be called TP and it will establish one or more conduction
paths from Vdd through Pi to the output node. Let these paths be
collectively called path A.

Comparing pattern T1 to T2, there may be multiple input
variables that change from logic-1 to logic-0 to turn on PFETs in the
FCG. Among all the input variables making transitions, some control
FETs in conduction paths other than path A. These are classified as
category one input variables. The other input variables belong to
category two.


Category one input variables are set to logic-0 because their
complement inputs are set to logic-1 within pattern TP so as to block
other conduction paths in the p-net. Category one input changes
cannot establish a conduction path between Vdd and the output
node. Otherwise, TP is not a valid test pattern for the P_i sop fault.

Category two input variables will turn on every FET, except the
faulty P_i, in path A. If a subset of these input variables establishes
a different conduction path C in the p-net, then these input changes
may invalidate the 2-pattern test. However, if path C exists in the
p-net then it will make path A redundant. Therefore, no other
conduction path can exist because the circuit is irredundant. As a
result, even if the 2-pattern test consists of multiple input changes,
the order of input transitions will not affect the validity of the test.

Similarly, the test for an NFET sop fault can be proved to be
valid under arbitrary delays.

Q.E.D. 
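
The order-independence claim can be checked on a small case with a
switch-level sketch. Everything in it is an assumption made for
illustration: the gate (p-net (a' + b') c', n-net ab + c), the pattern
TP = (a, b, c) = (0, 1, 0) for the PFET driven by a, and the simple
charge-retention model in which a floating output keeps its previous
value. The code applies the all-1 pattern, then changes the inputs one
at a time in every possible order, and records the final output of the
fault-free and the faulty gate.

```python
from itertools import permutations

T1 = {"a": 1, "b": 1, "c": 1}   # all-1 initialization pattern
T2 = {"a": 0, "b": 1, "c": 0}   # assumed test input TP for the PFET driven by a

def p_net(x, open_fet=None):
    # A PFET conducts when its input is 0, unless it is the stuck-open one.
    on = {k: v == 0 and k != open_fet for k, v in x.items()}
    return (on["a"] or on["b"]) and on["c"]

def n_net(x):
    # An NFET conducts when its input is 1; the n-net is the dual of the p-net.
    return bool((x["a"] and x["b"]) or x["c"])

def settle(x, prev, open_fet=None):
    if p_net(x, open_fet):
        return 1          # output pulled up to Vdd
    if n_net(x):
        return 0          # output pulled down to ground
    return prev           # floating node retains its charge

def final_output(order, open_fet=None):
    x = dict(T1)
    z = settle(x, None, open_fet)
    for name in order:    # inputs arrive one at a time (arbitrary delays)
        x[name] = T2[name]
        z = settle(x, z, open_fet)
    return z

changing = [k for k in T1 if T1[k] != T2[k]]
results = {order: (final_output(order), final_output(order, "a"))
           for order in permutations(changing)}
print(results)
```

In this example every ordering gives output 1 for the fault-free gate
and output 0 for the gate with the stuck-open PFET, so the fault is
detected regardless of the arrival order of the input transitions.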

2.3  A  testable  combinational  circuit 

A combinational circuit can be constructed by interconnecting
FCGs. However, a sop fault in this type of circuit is difficult to test
because the embedded FCGs of this circuit cannot be properly
initialized by using an all-1 or an all-0 pattern at the circuit inputs.
However, if an inverting buffer is added to every FCG which fans out
to other FCGs, we can easily initialize an embedded FCG by setting all
the circuit inputs to logic-1 or logic-0. This observation leads to the
concept of a testable gate and a testable combinational circuit.

[Definition 2] A Testable Gate (TG) consists of a fully
complementary gate connected to an Inverting Buffer (IB).

Figure 4 shows a block diagram of a TG. It is interesting to
note that any sop fault in the IB can be detected by toggling the IB's
input. This implies that applying a simplified 2-pattern test to detect a
sop fault in the n-net will toggle the input of the IB so that a sop fault
in FET P_IB will be detected as well. Therefore, it is not necessary to
develop separate test patterns for sop faults in the IB.

Figure 4. A testable gate structure.

For a TG in general, assume TP is the test input T2 for a sop
fault in the p-net, and TN is the test input T2 for a sop fault in the
n-net. A 2-pattern test set for detecting a sop fault in each part of a TG
is listed in Table 1.


Table 1. A 2-pattern test set for a TG.

[Definition 3] A Testable Combinational Circuit (TC) is a multi-level
combinational circuit which consists of two types of logic gates:
testable gates and fully complementary gates. Testable gates are
used in all but the last level of the circuit. Fully complementary gates
are used in the last level of the circuit.


[Property 2] (TC initialization)

If all the inputs of a TC are set to logic-0, then all of its logic gate
inputs are set to logic-0. Likewise, if all the inputs of a TC are set to
logic-1, then all of its logic gate inputs are set to logic-1. (During the
initialization of a TC, if both x and its complement x' are inputs to the
TC, they are assigned the same logic value.)
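
Property 2 can be illustrated with a small functional model. The
two-level circuit below is hypothetical, as are the helper names fcg,
tg, and gate_inputs. The model assumes that an FCG outputs the
complement of its n-net transmission and that the n-net transmission
contains only AND/OR combinations of the gate inputs, so a TG (an FCG
followed by an IB) is non-inverting.

```python
def fcg(n_net):
    """FCG: the output is the complement of the n-net transmission."""
    return lambda *ins: int(not n_net(*ins))

def tg(n_net):
    """TG: an FCG followed by an inverting buffer (IB)."""
    gate = fcg(n_net)
    return lambda *ins: int(not gate(*ins))

# Hypothetical TC: two testable gates feeding one last-level FCG.
g1 = tg(lambda a, b: a and b)
g2 = tg(lambda c, d: c or d)
g3 = fcg(lambda p, q: p and q)   # last level, no inverting buffer

def gate_inputs(a, b, c, d):
    """Return the values seen at the inputs of every gate in the TC."""
    p, q = g1(a, b), g2(c, d)
    return {"g1": (a, b), "g2": (c, d), "g3": (p, q)}

print(gate_inputs(0, 0, 0, 0))
print(gate_inputs(1, 1, 1, 1))
```

With all circuit inputs at logic-0 every gate input is 0, and with all
circuit inputs at logic-1 every gate input is 1, so the embedded
last-level gate receives the all-0 or all-1 initialization pattern
directly from the primary inputs.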

Due to the above property, it is straightforward to generate the
T1 input for any logic gate embedded in a TC. This TC design
technique resembles the CMOS domino circuit technique [Krambeck
82]. While a domino circuit offers a speed advantage over a
conventional static CMOS circuit, a TC provides a testing
advantage over a conventional static CMOS circuit.

Since the IB of every TG adds overhead to a TC design, it is
worthwhile to investigate the impact of such a design technique in
terms of circuit area and speed. A 4-bit adder circuit (FA4
macrofunction) described in [LSI 85] is used as an example.

First, we examined the area overhead by comparing the FET
counts of each design. The testable design uses 8% more FETs
than the original design. This number is much smaller than the
overhead figure of converting a conventional gate into a TG (<50%).
This is because the last level of a TC consists of conventional gates
which do not introduce any area overhead. Also, the original design
uses 5 inverters which are not required in the testable design.

To evaluate how the testable design can affect the circuit
performance, the critical paths of the two adder designs are compared.
These paths are shown in Fig. 5. The original design has six
conventional gates in its longest path. The same path in the testable
design contains the equivalent of nine conventional gates. Using
device parameters from the Stanford University Center for
Integrated Systems 2 um CMOS process, SPICE simulations show
negligible differences in the critical path delay of the two designs.
This is because the IBs increase the drive capability of each TG in the
critical path.


Figure 5. The critical path of a 4-bit adder.

(a) LSI Logic design (b) testable design.

3 APPLICATION TO SCAN-PATH ICs

Many VLSI circuits are designed with the scan-path technique.
This technique requires that every memory element in a sequential
circuit be connected as one or more shift registers (i.e., the
scan-paths). The scan-paths can be easily tested independent of the
sequential circuit configuration. The remaining combinational
circuitry is then tested by applying test patterns through the
scan-paths. However, to detect a sop fault in the testable combinational
circuit, it is not necessary to shift two patterns into the scan-path.
The T1 input pattern can be generated by the scan-path itself.

For example, in the LSSD scheme, the scan-path is made out
of Shift Register Latches (SRL) [Eichelberger 77]. A CMOS SRL
implementation which is capable of generating the T1 input is shown
in Fig. 6. Two control inputs (P and G) are added to the SRL. During
normal mode, the P input works as the power line, and the G input as
the ground line. The latch outputs (Q and Q') take on complementary
values. During test mode, if both P and G are set to logic-1, the SRL
outputs are forced to logic-1 and the scan-path generates an all-1
pattern. Similarly, if both P and G are logic-0, the scan-path
generates an all-0 pattern for the CUT.

A procedure to apply the simplified 2-pattern test through the
scan-path is described below.

1. (shift) Apply the appropriate number of A, B clock pulses to
the scan-path so that the T2 input pattern is shifted in and
stored in the latches.

2. (T1 input) Activate and hold the B clock line at logic-1. Set P
and G to logic-1 such that an all-1 pattern appears at the
inputs to the CUT, or set P and G to logic-0 if an all-0 pattern
is required.

3. (T2 input) Set P to logic-1 and G to logic-0, then the T2
input pattern will appear at the CUT inputs. Activate the C
clock line once to strobe the CUT response into the
scan-path.
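
The three steps above can be sketched as a behavioural model. This is
an illustration only: the function names srl_output and
apply_two_pattern are invented here, clocking is not modelled, and the
latch is assumed to keep its stored T2 value while P and G force its
outputs, as the procedure implies. The combination P = 0, G = 1 is not
described in the text, so the sketch rejects it.

```python
def srl_output(q, p, g):
    """Value that one SRL presents to the CUT, given stored bit q."""
    if p == 1 and g == 0:
        return q          # normal mode: P acts as power, G as ground
    if p == 1 and g == 1:
        return 1          # test mode: outputs forced to logic-1
    if p == 0 and g == 0:
        return 0          # test mode: outputs forced to logic-0
    raise ValueError("P = 0, G = 1 is not described in the text")

def apply_two_pattern(t2, all_ones=True):
    """Return the (T1, T2) patterns seen at the CUT inputs."""
    latches = list(t2)                          # step 1: T2 shifted in
    p, g = (1, 1) if all_ones else (0, 0)
    t1_seen = [srl_output(q, p, g) for q in latches]   # step 2: forced T1
    t2_seen = [srl_output(q, 1, 0) for q in latches]   # step 3: stored T2
    return t1_seen, t2_seen

print(apply_two_pattern((0, 1, 0), all_ones=True))
print(apply_two_pattern((1, 0, 1), all_ones=False))
```

Only the T2 pattern is ever shifted through the scan-path; the T1
pattern comes from the P and G controls, which is why a single shift
operation is enough for each 2-pattern test.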


Figure 6. A CMOS shift register latch design for testability.


4  SUMMARY  AND  CONCLUSIONS 

Testing for CMOS sop faults in a scan-path IC is difficult
because of the following problems: (1) Detection of a sop fault
requires the application of two test patterns to the circuit. (2) Arbitrary
delays in the CUT can invalidate a 2-pattern test. (3) It is almost
impossible to apply the 2-pattern tests through a scan-path.

This paper describes a design technique which can facilitate
the testing of sop faults in a scan-path IC. The technique has the
following features.

(1) It is based on a testable combinational circuit design.

(2) Every logic gate in the testable circuit can be tested with a
simplified 2-pattern test set for its sop fault.

(3) This test set cannot be invalidated by arbitrary circuit delays.

(4) For scan-path ICs, the simplified 2-pattern test can be easily
applied through the scan-path by specially designed shift
register latches.

A  4-bit  adder  is  designed  according  to  the  proposed 
technique.  The  testable  design  uses  8%  more  FETs  than  the 
original  design  and  imposes  no  performance  penalty. 